C++ Library for Easy Command-Line Parsing
      by John M. Dlugosz


I've always felt that the argv[] array was difficult to use.  Not bad,
just _primitive_.  If all you have are a couple of arguments, it is not
too hard.  But you still have to check for the correct count and convert
each value to the proper type.

If your program has various flags and switches, things can get much more
difficult.  How many programs have you written where you suffered through
the argument processing?  In how many programs have you _wished_ you had a
better way?  In my case, I've written many simple programs that could have
benefited from command-line arguments, but found them more trouble than
they were worth.  So I was stuck with a simpler, less flexible program.
For test code and such, I would even change a value and recompile instead
of adding proper command-line processing.

Now, I do have a simple way.  It has revolutionized the way I write small
programs.  Rich command line argument processing, sign-on messages, and
help on usage are now trivial.

Here is an example.  Consider a program that takes a `-v' switch for
verbose mode.  Using this library, this is accomplished by including the
definition

    cmdl_flag v ('v', "requests verbose mode");

to make the program recognize the flag, and code such as

    if (v()) {  //do this in verbose mode
       //whatever...
       }

to respond to the state of this flag.  There is no messy string
manipulation or error checking to write.  The library automatically
handles the `-v' and `/v' forms, disabling a switch with `-v-', cascading
switches such as `-vbx', and other features.

Notice the definition of `v' above takes two constructor arguments.
The second argument is a string that provides usage information.  The library
will automatically generate the usage message, collecting the messages from
all the parameters in the program.
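
The switch forms mentioned above (`-v', `/v', `-v-', and cascades such as
`-vbx') can be made concrete with a short sketch.  This is an illustration
only, not the library's parser; the names `FlagSetting' and
`expand_switches' are hypothetical, and the rule that a `-' after a letter
disables that letter is an assumption based on the description above.

```cpp
#include <string>
#include <vector>

// Hypothetical illustration type: one flag letter and its requested state.
struct FlagSetting {
    char name;
    bool on;
};

// Expand one command-line token into individual flag settings.
// Accepts a leading '-' or '/', cascaded letters ("-vbx"), and a '-'
// directly after a letter to turn that flag off ("-v-").
// Returns an empty vector if the token is not a switch.
std::vector<FlagSetting> expand_switches(const std::string& token) {
    std::vector<FlagSetting> result;
    if (token.size() < 2 || (token[0] != '-' && token[0] != '/'))
        return result;
    for (std::string::size_type i = 1; i < token.size(); ++i) {
        FlagSetting f = { token[i], true };
        if (i + 1 < token.size() && token[i + 1] == '-') {
            f.on = false;   // "x-" disables flag x
            ++i;            // skip the '-'
        }
        result.push_back(f);
    }
    return result;
}
```

A real parser would go on to look up each letter among the defined
parameters and report unknown ones; the sketch stops at splitting the token.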

Concepts
--------

The basic idea is to model command-line parameters as program arguments.
That is, they should be analogous to arguments passed to a function.
In a function call, each value passed is bound to a name in the called
function.  By analogy, a program argument is a name which gets bound to
something which can be specified on the command line.  To provide for
command line input, you declare those arguments you want to receive, along
with their types.

The cmdl library has a class for each kind of command-line parameter:  flags,
integers, and strings (more can be added).

The constructor is given the name of the parameter, as used on the command
line.  It can also be given a help string, and flags.  Here are some
examples:

    typedef cmdl_flag flag;
    flag v ('v', "requests verbose mode");
    flag s ('s', "specifies alternate algorithm");
    flag T ('T', "prevents the foobar from clearing (debugging)", cmdl::once);
    cmdl_string pos1 ((char*)0, "first positional parameter", cmdl::required);
    cmdl_string pos2 ((char*)0, "second positional parameter");
    cmdl_string pos3 ((char*)0, "third positional parameter");
    cmdl_int count ('c', "iteration count");
    cmdl_help helper;

This shows the following types:

* Type cmdl_flag is a simple switch.  Using the flag makes the parameter
TRUE; if it is absent, the parameter is FALSE.  You can also turn off the
switch by using the name with a trailing `-' sign.  (The library takes care
of cascading switches, too.)

* Type cmdl_string allows input of an arbitrary string.  The syntax is
somewhat flexible, with the argument separated from the keyword by a space
or an `=', and the string can be in quotes.

* Type cmdl_int allows input of an integer.  The input is checked for valid
syntax.

* Type cmdl_help provides for an automatically generated help screen if the
command line is empty, or with the `-?' switch.

Except for the special cmdl_help class, the constructors take two or
three arguments.  The first is the name of the command-line parameter.
This can be given as a single char or as a string.
If passed `(char*)0', there will be no name and it is taken to be a
positional parameter, explained later.

The second constructor argument is the usage help string.

The optional third argument to the constructors is a set of option flags.
`once' indicates that the argument can appear only once on the command line.
Ordinarily, a repeated parameter overrides the earlier occurrence.
The `required' flag means that it is an error to omit the parameter.
There are others, detailed in the code listing.

A flag worth particular attention is `keyword'.  If it is set, the
command-line parameter name is written without the switch character ('-' or
'/').  A keyword found anywhere outside a quoted string is treated as an
instance of the parameter.

Using class cmdl in a program
-----------------------------

The program that contains these definitions will kick off everything by
calling `cmdl::parseit();'.

This works because the constructor for each command-line argument class
links the object into a linked list.  The command-line argument objects
should be global, or defined in `main' before calling `parseit()'.  In
any case, no command-line object should ever go out of scope before
`parseit()' is called.

Because the objects link themselves up, the complete collection of
defined command line parameters is known.  `parseit()' will parse the
command line, and compare what it finds with the list of possible arguments.
It takes care of usage errors and such, so the program aborts if the
command line is invalid.  No error checking is required by the main
program.
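
The self-linking idea can be sketched in a few lines.  This is a simplified
model and not the code in CMDL.H:  the class name `Param', its members, and
the helper `count_params()' are assumptions made for illustration.

```cpp
#include <cstddef>

// Hypothetical base class: every object adds itself to one shared list.
class Param {
public:
    explicit Param(const char* help)
        : help_(help), next_(head_) {
        head_ = this;              // push this object on the front of the list
    }
    const char* help() const { return help_; }
    const Param* next() const { return next_; }
    static const Param* first() { return head_; }
private:
    const char* help_;
    Param* next_;
    static Param* head_;
    Param(const Param&);            // declared but not defined: no copying,
    Param& operator=(const Param&); // since a copy would not be in the list
};

// Zero-initialized before any constructor runs, so global objects are safe.
Param* Param::head_ = 0;

// Walk the list, as a parser or a usage printer would.
std::size_t count_params() {
    std::size_t n = 0;
    for (const Param* p = Param::first(); p != 0; p = p->next())
        ++n;
    return n;
}
```

The sketch does not unlink an object when it is destroyed, so the list would
hold a dangling pointer if an object went out of scope before the list was
walked.  That is the same hazard the scope rule above guards against.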

Each command-line-parameter object contains an `operator()' which provides
a succinct way to get the value of that parameter.  There is a default value
in case it was not specified on the command line.  If you would rather check
for its presence, use the `hasvalue()' member.

Before calling `parseit()', you can use the static member `signon()' to
supply a string that is shown with the usage help message.

See the listing of TEST3.CPP and other files for usage of the examples
described above.

Kinds of argument names: char, string, and positional
-----------------------------------------------------

The first argument to the constructor of the command-line parameter
objects is the name of the parameter that will be used on the command line.
The constructor has two forms.  It can take a char, for a single-letter
switch, or a string (char*), for arbitrary names.

In addition, the string form responds to a special name of NULL.  Passing
in `(char*)0' for the name makes it a positional parameter.  The parser
will not assign it based on a name.  Instead, it is used for unnamed
parameters.

If a parameter does not start with a '-' or '/', and it does not match
the name of a keyword parameter (those that don't use the '-'), it is
taken to be a positional parameter.  It is assigned to the first unused
positional parameter you defined.  This lets you mix switches with
non-named parameters such as filenames.
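
Here is a minimal sketch of that assignment rule, under simplifying
assumptions:  keyword parameters are ignored, any token starting with '-' or
'/' is treated as a switch, and the names `Split' and `assign_positionals'
are hypothetical rather than part of the library.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Result of the sketch: positional values in the order they were filled,
// plus the switch tokens that were set aside.
struct Split {
    std::vector<std::string> positional;
    std::vector<std::string> switches;
    bool too_many;   // true if there were more unnamed tokens than slots
};

// Distribute tokens: anything starting with '-' or '/' is a switch;
// every other token goes to the first unused positional slot.
Split assign_positionals(const std::vector<std::string>& tokens,
                         std::size_t slots) {
    Split s;
    s.too_many = false;
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        const std::string& t = tokens[i];
        if (!t.empty() && (t[0] == '-' || t[0] == '/'))
            s.switches.push_back(t);
        else if (s.positional.size() < slots)
            s.positional.push_back(t);
        else
            s.too_many = true;
    }
    return s;
}
```

With three slots and the tokens `in.txt', `-v', `out.txt', the first two
slots receive `in.txt' and `out.txt' and the third stays unused.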

Note that positional parameters can be flagged as `required'.

The Use of C++
--------------

A few C++ language concepts may need explaining.

Note the syntax of the flags in the third constructor argument:

    cmdl::required | cmdl::keyword

The names here are enumeration constants.  They are created with an
`enum' definition (see CMDL.H, line 50).  The names are defined within
the class, and are in the scope of the class.  They are not global, and
don't pollute the global namespace.  So, you have no conflict with
a name `keyword' used elsewhere in the program, for example.  The downside
of this is that you must qualify the name with its class name, as shown.
Note that in C, you probably would have seen `CMDL_KEYWORD' instead; the
name would contain its "family" identifier as part of itself.  So
class-scoped names like these really require no additional typing.

The enumeration constants are given explicit values as powers of 2, so
they behave as flags which can be combined with | or +.  The function
parameter that takes the flags is declared as unsigned, not as an enum
type.  (In fact, the enum type has no name; it just defines the constants.)
This is necessary because the result of | or + is an int, not an enum type.
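
The technique looks like this in isolation.  The class `Options' and its
member `is_set()' are hypothetical; the constant names echo the ones used in
this article, but the values shown are assumed and are not taken from CMDL.H.

```cpp
// Hypothetical class showing class-scoped flag constants.
class Options {
public:
    // Unnamed enum: it only defines the constants, each a power of 2.
    enum { once = 1, required = 2, keyword = 4 };

    // The parameter is unsigned because combining enumerators with |
    // yields an int, which does not convert back to an enum implicitly.
    explicit Options(unsigned flags = 0) : flags_(flags) {}

    bool is_set(unsigned flag) const { return (flags_ & flag) != 0; }
private:
    unsigned flags_;
};
```

A caller would write `Options o(Options::required | Options::keyword);' and
the names `required' and `keyword' remain free for other uses at global
scope.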

The class contains two enum definitions whose constants share the same
flags variables; the constants in one are public and those in the other
are protected.

Another interesting feature is the use of `operator()'.  See CMDL.H lines
84, 97, and 109.  The operator is defined with the name `operator()' which
is then followed by the parameter list.  Here it has no parameters, so you
see two sets of ()'s in a row.  The operator is invoked by following the
object name with the parameter list, as shown in the test programs.
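
For readers who have not met this operator before, here is a small sketch.
The class `IntParam' and its `set()' member are hypothetical stand-ins; only
the names `operator()' and `hasvalue()' come from the library as described
in this article, and their real declarations may differ.

```cpp
// Hypothetical integer parameter, reduced to the value-access part.
class IntParam {
public:
    explicit IntParam(int default_value)
        : value_(default_value), has_value_(false) {}

    // Stand-in for what a parser would do when it finds the parameter.
    void set(int v) { value_ = v; has_value_ = true; }

    // "operator()" is the function's name; the second "()" is its
    // (empty) parameter list.
    int operator()() const { return value_; }

    bool hasvalue() const { return has_value_; }
private:
    int value_;
    bool has_value_;
};
```

Given `IntParam count(10);', the expression `count()' yields 10 until
`set()' is called, and `count.hasvalue()' tells the two cases apart.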


The positional parameter ability requires you to pass `(char*)0' instead of
just 0 because a plain 0 is ambiguous:  it can be a char '\0' or a null
pointer.
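
The ambiguity is easy to reproduce with two overloaded functions standing in
for the two constructor forms.  The names `NameKind' and `kind' are
hypothetical and exist only for this illustration.

```cpp
// Which form of name was passed (hypothetical illustration type).
enum NameKind { letter_name, string_name, positional };

// Stand-ins for the char and string forms of the constructor.
NameKind kind(char) { return letter_name; }
NameKind kind(const char* name) { return name ? string_name : positional; }

// kind(0);   // would not compile: the int literal 0 converts equally
//            // well to char and to const char*, so the call is ambiguous
```

The cast in `kind((char*)0)' gives the argument a pointer type, so only the
string form is a good match and the null name is seen as intended.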


The Parser
----------

The core parser code breaks up the command line into tokens and looks up
names of parameters.  The value of parameters is sent to the matched object
for conversion to the proper type.  The virtual `scan()' function does
this final part.  An earlier version of this library had a seemingly more
flexible system that allowed significant customization in the specific
parameter type's code.  However, it proved too clumsy and was never really
used.  This points out a good design philosophy:  make a thing just flexible
enough.  If it is too configurable, it can become as difficult to use as
writing the code by hand each time, or more so, which is exactly what the
library is supposed to avoid.
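
The division of labor between the parser and a virtual `scan()' function
might look like the following sketch.  The signature of `scan()' and the
class names `ParamBase', `IntValue', and `StringValue' are assumptions; the
article does not show the real declarations.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical base class: the parser finds the object by name, then hands
// it the raw text.  scan() returns false if the text is not valid.
class ParamBase {
public:
    virtual ~ParamBase() {}
    virtual bool scan(const char* text) = 0;
};

class IntValue : public ParamBase {
public:
    IntValue() : value(0) {}
    virtual bool scan(const char* text) {
        char* end = 0;
        long v = std::strtol(text, &end, 10);
        if (end == text || *end != '\0')
            return false;              // no digits, or trailing junk
        value = static_cast<int>(v);   // range checking omitted for brevity
        return true;
    }
    int value;
};

class StringValue : public ParamBase {
public:
    virtual bool scan(const char* text) { value = text; return true; }
    std::string value;
};
```

The parser needs to know only `ParamBase'; adding a new parameter type means
writing one more class with its own `scan()'.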

The parser uses a class cmdlscan for low-level character manipulation and
tokenizing.  I plan to give this class more power in the future, for better
error reporting, and some of its implementation details are the way they
are for that reason.  The need for this class became apparent because
several related values, including the string and its current scan position,
were always being passed together.  When things like this happen, think about
combining them into an object.
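
As an illustration of that advice, here is a small scanner that keeps the
text and the scan position together.  It is not the cmdlscan class; the name
`Scanner', its members, and the quoting rule it applies are assumptions.

```cpp
#include <cctype>
#include <string>

// Hypothetical scanner: bundles the text with the current position so
// callers pass one object instead of a string plus an index.
class Scanner {
public:
    explicit Scanner(const std::string& text) : text_(text), pos_(0) {}

    bool at_end() const { return pos_ >= text_.size(); }

    // Return the next whitespace-delimited token.  A token that starts
    // with '"' runs to the closing quote, and the quotes are stripped.
    // An empty result means no token remains (or an empty quoted string).
    std::string next_token() {
        while (!at_end() && is_space(text_[pos_]))
            ++pos_;
        std::string tok;
        if (at_end())
            return tok;
        if (text_[pos_] == '"') {
            ++pos_;                          // skip opening quote
            while (!at_end() && text_[pos_] != '"')
                tok += text_[pos_++];
            if (!at_end())
                ++pos_;                      // skip closing quote
        } else {
            while (!at_end() && !is_space(text_[pos_]))
                tok += text_[pos_++];
        }
        return tok;
    }
private:
    static bool is_space(char c) {
        return std::isspace(static_cast<unsigned char>(c)) != 0;
    }
    std::string text_;
    std::string::size_type pos_;
};
```

Because the position lives inside the object, richer error reporting (for
example, pointing at the offending column) could be added in one place.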


Error and Report Output
-----------------------

I did not want the code to simply use `cerr' and `cout' for output.
The library may be used in programs that have their own idea of I/O,
including programs that run in graphics mode.  For maximum flexibility, all
output is kept separate:  the final results are funneled through a pair of
functions called `cmdl::output()', both defined in OUTPUT.CPP.  If you link
with CMDL.LIB, you can supply your own versions of these two functions to
handle output your way, without having to recompile the cmdl library.
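
The funnel idea can be shown in a single file, with one caveat:  the library
relies on link-time replacement of the output functions, which a one-file
example cannot demonstrate.  The sketch below swaps in a different
technique, a replaceable function pointer, to get the same effect.  All of
the names in it are hypothetical.

```cpp
#include <cstdio>
#include <string>

// Hypothetical sink type; not part of the cmdl library.
typedef void (*OutputFn)(const std::string& text);

// Default sink: write to standard error.
void default_output(const std::string& text) {
    std::fputs(text.c_str(), stderr);
}

// Alternative sink for a program with its own display: collect the text
// in a buffer that the program can draw however it likes.
std::string output_buffer;
void buffer_output(const std::string& text) {
    output_buffer += text;
}

// The one place that decides where output goes.
OutputFn current_output = default_output;

// Library-side code calls only this function, never a stream directly.
void report(const std::string& text) {
    current_output(text);
}
```

A program would set `current_output = buffer_output;' once at startup, after
which every message passed to `report()' lands in `output_buffer'.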